CSE314 Group Project

Data Visualization

About the data

| Column name | Column meaning | Example value |
|---|---|---|
| raw_row_number | A number used to join clean data back to the raw data. | 38299 |
| date_time | The date and time of the stop, in YYYY-MM-DD HH:MM format. | "2017-02-02 20:15" |
| location | The freeform text of the location. Occasionally, this represents the concatenation of several raw fields, i.e. street_number, street_name. | "248 Stockton Rd." |
| county_name | County name, where provided. | "Allegheny County" |
| subject_age | The age of the stopped subject. When date of birth is given, we calculate the age based on the stop date. Values outside the range of 10-110 are coerced to NA. | 54.23 |
| subject_race | The race of the stopped subject. Values are standardized to white, black, hispanic, asian/pacific islander, and other/unknown. | "hispanic" |
| subject_sex | The recorded sex of the stopped subject. | "female" |
| officer_id_hash | A unique hash of the officer ID used to identify individual officers within a location. This is usually just a hash of the provided officer ID or badge number. | "a888fdc120" |
| department_name | Name of the department or subdivision to which the officer has been assigned. | "Charlotte-Mecklenburg Police Department" |
| type | Type of stop: vehicular or pedestrian. | "vehicular" |
| arrest_made | Indicates whether an arrest was made. | FALSE |
| citation_issued | Indicates whether a citation was issued. | TRUE |
| warning_issued | Indicates whether a warning was issued. | TRUE |
| outcome | The strictest action taken among arrest, citation, warning, and summons. | "citation" |
| frisk_performed | Indicates whether a frisk was performed. This is technically different from a search, but departments will sometimes include frisks as a search type. | TRUE |
| search_conducted | Indicates whether any type of search was conducted, i.e. driver, passenger, vehicle. Frisks are excluded where the department has provided resolution on both. | TRUE |
| search_person | Indicates whether a search of a person has occurred. This is only defined when search_conducted is TRUE. | TRUE |
| search_vehicle | Indicates whether a search of a vehicle has occurred. This is only defined when search_conducted is TRUE. | TRUE |
| search_basis | The reason for the search, where provided, categorized into k9, plain view, consent, probable cause, and other. If a search occurred but the reason wasn't listed, we assume probable cause. | "consent" |
| reason_for_stop | A freeform text field indicating the reason for the stop, where provided. | "EQUIPMENT MALFUNCTION" |
| raw_Ethnicity | The raw data's hispanic/non-hispanic column. | ['H', 'N'] |
| raw_Race | The raw data's race column. | ['W', 'B', 'A', 'U', 'I'] |
| raw_action_description | The raw data's policy stop outcome. | ['Citation Issued', 'Verbal Warning', 'Written Warning', 'On-View Arrest', 'No Action Taken'] |
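Several of the column descriptions above are really cleaning rules (the date format, the 10-110 age range, the probable-cause default, and the "strictest action" outcome). The sketch below shows one way those rules could be applied with pandas. The sample rows are made up for illustration, the strictness order arrest > citation > warning is an assumption (summons is left out because the table lists no summons column), and none of this is taken from the project's actual preprocessing code.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
stops = pd.DataFrame({
    "date_time": ["2017-02-02 20:15", "2017-02-03 08:40", "2017-02-04 13:05"],
    "subject_age": [54.23, 7.0, 131.0],
    "search_conducted": [True, True, False],
    "search_basis": ["consent", None, None],
    "arrest_made": [False, True, False],
    "citation_issued": [True, True, False],
    "warning_issued": [True, False, True],
})

# Parse the YYYY-MM-DD HH:MM strings into timestamps.
stops["date_time"] = pd.to_datetime(stops["date_time"], format="%Y-%m-%d %H:%M")

# Ages outside the 10-110 range are coerced to NA.
in_range = stops["subject_age"].between(10, 110)
stops["subject_age"] = stops["subject_age"].where(in_range)

# A search with no recorded reason is assumed to be probable cause.
missing_basis = stops["search_conducted"] & stops["search_basis"].isna()
stops.loc[missing_basis, "search_basis"] = "probable cause"

# outcome = strictest action taken. Assign from mildest to strictest so that
# a stricter action overwrites a milder one (assumed order, see lead-in).
stops["outcome"] = None
stops.loc[stops["warning_issued"], "outcome"] = "warning"
stops.loc[stops["citation_issued"], "outcome"] = "citation"
stops.loc[stops["arrest_made"], "outcome"] = "arrest"

print(stops[["date_time", "subject_age", "search_basis", "outcome"]])
```

The first sample row mirrors the table's example values: a citation and a warning were both issued, and the derived outcome is the stricter of the two.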

Exploratory Data Analysis (EDA)

In [11]:
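# Build an automated profiling report (per-column statistics, missing values,
# correlations) for the training DataFrame. `train` is assumed to have been
# loaded in an earlier cell of the notebook.
from pandas_profiling import ProfileReport
ProfileReport(train)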
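The rendered profiling report does not survive in this page, so as a rough stand-in, here is a minimal sketch of a per-column summary built with plain pandas. The helper name `quick_summary` and the sample rows are hypothetical and not part of the project; the real report covers far more than this.

```python
import pandas as pd


def quick_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, share of missing values, and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_frac": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })


# Hypothetical sample with a few of the columns described above.
sample = pd.DataFrame({
    "subject_sex": ["female", "male", None, "female"],
    "subject_age": [54.23, None, 31.0, 22.5],
    "arrest_made": [False, False, True, False],
})

summary = quick_summary(sample)
print(summary)
```

Calling `quick_summary(train)` on the real training data would give a first look at which columns are sparsely populated before opening the full report.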

Model Examination

In [12]:
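# Build the comparison DataFrame by hand from each model's recorded scores.
import pandas as pd

Models = ["F KNN","F LR ","F DT", "S KNN","S LR ","S DT"]
R2 = [0.635,0.723,0.639,0.565, 0.703, 0.580]
Acc = [0.7,0.8,0.4,0.63,0.77,0.35]

modeln = pd.DataFrame({
    "Model": Models,
    "R2": R2,
    "Accruacy": Acc,  # spelling kept as-is: the Dash cell below selects this column by name
})
modeln.head()  # shows the first five of the six rows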
Out[12]:
Model R2 Accruacy
0 F KNN 0.635 0.70
1 F LR 0.723 0.80
2 F DT 0.639 0.40
3 S KNN 0.565 0.63
4 S LR 0.703 0.77
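Because `head()` prints only five rows, the sixth model ("S DT") is missing from the output above. As a small sketch, the same scores can be ranked directly to see all six models and pick out the best one per metric. The DataFrame is rebuilt here so the snippet runs on its own; the names `ranked`, `best_by_accuracy`, and `best_by_r2` are illustrative.

```python
import pandas as pd

# Same scores as in the cell above, repeated so this snippet is self-contained.
modeln = pd.DataFrame({
    "Model": ["F KNN", "F LR ", "F DT", "S KNN", "S LR ", "S DT"],
    "R2": [0.635, 0.723, 0.639, 0.565, 0.703, 0.580],
    "Accruacy": [0.7, 0.8, 0.4, 0.63, 0.77, 0.35],
})

# Rank every model by accuracy, highest first.
ranked = modeln.sort_values("Accruacy", ascending=False).reset_index(drop=True)
print(ranked)

# Best model per metric; strip() drops the trailing space in some labels.
best_by_accuracy = modeln.loc[modeln["Accruacy"].idxmax(), "Model"].strip()
best_by_r2 = modeln.loc[modeln["R2"].idxmax(), "Model"].strip()
print(best_by_accuracy, best_by_r2)
```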
In [13]:
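# Modified from source: https://www.youtube.com/watch?v=FuJOsZgo4nU
# Imports use the Dash 2.x style; earlier cells of the notebook may already provide them.
from dash import dcc, html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash
import plotly.express as px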
#-------------------------------------------------------------------------------------

app = JupyterDash(__name__)

#-------------------------------------------------------------------------------------
app.layout = html.Div([

        html.Div([
            html.Pre(children= "Model Comparison",
            style={"text-align": "center", "font-size":"100%", "color":"black"})
        ]),

        html.Div([
            html.Label(['X-axis:'],style={'font-weight': 'bold'}),
            dcc.RadioItems(
                id='xaxis_raditem',
                options=[
                         {'label': 'Models', 'value': 'Model'},
                         # Add further x-axis options here, for example:
                         # {'label': 'data', 'value': 'Data'},
                ],
                value='Model',
                style={"width": "50%"}
            ),
        ]),

        html.Div([
            html.Br(),
            html.Label(['Y-axis:'], style={'font-weight': 'bold'}),
            dcc.RadioItems(
                id='yaxis_raditem',
                options=[
                         {'label': 'R2', 'value': 'R2'},
                         {'label': 'Accuracy', 'value': 'Accruacy'},
                         # Add further y-axis options here, for example:
                         # {'label': 'data', 'value': 'Data'},
                ],
                value='Accruacy',
                style={"width": "50%"}
            ),
        ]),

    html.Div([
        dcc.Graph(id='the_graph')
    ]),

])

#-------------------------------------------------------------------------------------
@app.callback(
    Output(component_id='the_graph', component_property='figure'),
    [Input(component_id='xaxis_raditem', component_property='value'),
     Input(component_id='yaxis_raditem', component_property='value')]
)
def update_graph(x_axis, y_axis):
    """Redraw the bar chart whenever either radio selection changes."""
    dff = modeln

    barchart = px.bar(
        data_frame=dff,
        x=x_axis,
        y=y_axis,
        title=y_axis + ': by ' + x_axis,
    )

    # Order the bars by value, smallest first, and center the title.
    barchart.update_layout(xaxis={'categoryorder': 'total ascending'},
                           title={'xanchor': 'center', 'yanchor': 'top', 'y': 0.9, 'x': 0.5})

    return barchart

if __name__ == '__main__':
    app.run_server(mode='inline')